Skip to content

migrate os.listdir() to os.scandir() to increase performance#78

Merged
AyanSinhaMahapatra merged 1 commit into
aboutcode-org:mainfrom
xsuchy:scandir
Mar 6, 2025
Merged

migrate os.listdir() to os.scandir() to increase performance#78
AyanSinhaMahapatra merged 1 commit into
aboutcode-org:mainfrom
xsuchy:scandir

Conversation

@xsuchy

@xsuchy xsuchy commented Jan 17, 2025

Copy link
Copy Markdown
Contributor

@xsuchy

xsuchy commented Jan 17, 2025

Copy link
Copy Markdown
Contributor Author

I did very simple and trivial benchmark on walking through the kernel tree:

  • previously: 45 second
  • after this PR: 36 second

@xsuchy

xsuchy commented Jan 17, 2025

Copy link
Copy Markdown
Contributor Author

If this gets merged, I will open another PR for the remaining occurrences of listdir() in other functions.

As recommended in https://peps.python.org/pep-0471/

Signed-off-by: Miroslav Suchý <msuchy@redhat.com>
@xsuchy

xsuchy commented Jan 29, 2025

Copy link
Copy Markdown
Contributor Author

Any review comments?

@xsuchy

xsuchy commented Feb 19, 2025

Copy link
Copy Markdown
Contributor Author

@AyanSinhaMahapatra or @pombredanne can I kindly ask you for a review?

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xsuchy Thanks++ for the PR.

Thanks for the references, we should definitely move towards using scandir() allright in this context. I've tested out this branch with scancode-toolkit latest too, and everything looks good, and we do get some performance improvements there on the codebase collection step similarly as indicated by you, thanks for doing and including those too.

  1. Some more CI updates and other fixes were also added to main, could you rebase to latest main?
  2. Please go ahead and use scandir() in the other instances of listdir() use in commoncode too.

This looks ready otherwise, and apologies for the late review.

We might want to do the same in other places too which would have some effect on the performance, https://github.com/search?q=org%3Aaboutcode-org+os.listdir&type=code maybe this could be useful in extractcode/fetchcode/deltacode too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants